Goto

Collaborating Authors

 regularity condition


Deep Neural Networks for Doubly Robust Estimation with Nonprobability Survey Samples

arXiv.org Machine Learning

Integrating probability and nonprobability survey samples is an important problem in modern survey sampling. Nonprobability samples often contain rich outcome information but may lack population representativeness, whereas probability samples provide design-based auxiliary information but may not contain the study variable. We propose a deep neural network (DNN)-assisted doubly robust framework for estimating the finite population mean from these two data sources. The proposed method models the logit sampling score for the nonprobability sample as an unknown nonparametric function and estimates it by maximizing a pseudo-likelihood that combines information from the nonprobability sample and a reference probability sample. The DNN parameters are optimized using the ADAM algorithm. The resulting DNN-estimated sampling scores are incorporated into a DNN-assisted inverse-probability weighted estimator and a deep doubly robust estimator. We establish consistency and convergence rates under regularity conditions and evaluate the finite-sample performance of the proposed estimators through simulation studies and an empirical application using Pew Research Center and Behavioral Risk Factor Surveillance System data. The results suggest that the proposed estimators can improve robustness to parametric propensity-score misspecification, especially when the true selection mechanism is nonlinear.


Fast Convergence of Policy Regret in Learning Stochastic Optimal Control

arXiv.org Machine Learning

Policy learning in modern operations environments faces a fundamental tension between limited operational data and the large, often continuous, state and action spaces over which good decisions must be identified and deployed. We study value-based policy learning in stochastic optimal control: a greedy policy induced by an estimate of the optimal action-value function $Q^*$ is deployed, and its performance is measured by regret. The empirical success of this approach calls for statistical insight into the structures that enable fast regret convergence. We show that, in continuous action spaces, fast policy learning is induced by three geometric structures: a growth exponent $p$, which quantifies how quickly $Q^*$ separates suboptimal actions from its maximizers; a margin-mass exponent $m$, which controls how much deployment mass lies on states with weak growth; and an action-wise regularity exponent $q$, which measures the smoothness of the $Q^*$-estimation error across actions. Given a $n^{-1/2}$-accurate estimator of $Q^*$, we show that the minimax-optimal policy regret convergence rate is \[ \widetildeฮ˜\left( n^{-\min\left\{\frac{p}{2(p-q)},\frac{m+1}{2m}\right\}} \right), \] up to a logarithmic factor at the boundary between the two regimes. The exponent $q$ is crucial: $q>0$ yields faster-than-$n^{-1/2}$ regret. This regime is natural in operations applications. In particular, we verify $q>0$ under mild regularity conditions in dynamic inventory control and service allocation examples, while the mechanism underlying this fast rate regime extends beyond these settings.


Sample efficient inductive matrix completion with noise and inexact side information

arXiv.org Machine Learning

Low-rank matrix completion is a widely studied problem with many variants. Inductive matrix completion (IMC) incorporates row and column side information to significantly narrow the search space. Prior work falls into two regimes: methods that exploit this structure to achieve reduced sample complexity but only in noiseless settings, and methods that handle noise but require sample complexity matching the ambient matrix dimension, forfeiting the sample efficiency that side information should provide. In this paper, we close this gap by studying noisy IMC with a nonconvex projected gradient descent algorithm with spectral initialization. Our main technical contribution is establishing a regularity condition for the IMC loss function that holds at the reduced sample complexity determined by the effective problem size, scaling with the side information dimension a rather than the ambient dimension n. This directly yields linear convergence and an estimation error that both depend only on the effective problem size rather than the ambient matrix dimension. We further extend our analysis to the inexact side information setting, demonstrating that the reduced sample complexity is maintained and the estimation error is order-optimal with respect to the inexactness of the side information. Extensive simulations and real-world experiments on the MovieLens dataset validate our theoretical findings.


Appendix AToy example

Neural Information Processing Systems

In this section, we provide and expand upon a toy example. Recall that the inputs x and x0 need not correspond to real users but could instead represent hypothetical users. Example 5. Suppose that the regulatory guideline requires that users in the same geographical location receive similar weather forecasts. This can be written as "the weather forecasts that are selected by F should be similar for all users in the same geographical location", and S could be a randomly generated set of user pairs, where each pair corresponds to two (hypothetical) users in the same geographical location, and S could contain pairs across many locations. In the left-most panel, a filtering algorithm F takes in counterfactual inputs x and x0 and produces the content Z and Z0. Because a counterfactual regulation requires that F behave similarly under x and x0, the regulation is effectively requiring that content Z and Z0 are sufficiently similar (or, graphically, that they are close in Z). The question of how to quantify "similarity" is addressed in Section 2.1. The toy example in Example 5 is illustrated in the right-most panel.


Dimensionally Tight Bounds for Second-Order Hamiltonian Monte Carlo

Neural Information Processing Systems

Hamiltonian Monte Carlo (HMC) is a widely deployed method to sample from high-dimensional distributions in Statistics and Machine learning. HMC is known to run very efficiently in practice and its popular second-order ``leapfrog implementation has long been conjectured to run in $d^{1/4}$ gradient evaluations. Here we show that this conjecture is true when sampling from strongly log-concave target distributions that satisfy a weak third-order regularity property associated with the input data. Our regularity condition is weaker than the Lipschitz Hessian property and allows us to show faster convergence bounds for a much larger class of distributions than would be possible with the usual Lipschitz Hessian constant alone. Important distributions that satisfy our regularity condition include posterior distributions used in Bayesian logistic regression for which the data satisfies an ``incoherence property. Our result compares favorably with the best available bounds for the class of strongly log-concave distributions, which grow like $d^{{1}/{2}}$ gradient evaluations with the dimension. Moreover, our simulations on synthetic data suggest that, when our regularity condition is satisfied, leapfrog HMC performs better than its competitors -- both in terms of accuracy and in terms of the number of gradient evaluations it requires.


A New Kernel Regularity Condition for Distributed Mirror Descent: Broader Coverage and Simpler Analysis

arXiv.org Machine Learning

Existing convergence of distributed optimization methods in non-Euclidean geometries typically rely on kernel assumptions: (i) global Lipschitz smoothness and (ii) bi-convexity of the associated Bregman divergence function. Unfortunately, these conditions are violated by nearly all kernels used in practice, leaving a huge theory-practice gap. This work closes this gap by developing a unified analytical tool that guarantees convergence under mild conditions. Specifically, we introduce Hessian relative uniform continuity (HRUC), a regularity satisfied by nearly all standard kernels. Importantly, HRUC is closed under concatenation, positive scaling, composition, and various kernel combinations. Leveraging the geometric structure induced by HRUC, we derive convergence guarantees for mirror descent-based gradient tracking without imposing any restrictive assumptions. More broadly, our analysis techniques extend seamlessly to other decentralized optimization methods in genuinely non-Euclidean and non-Lipschitz settings.